The particle filter returns a Monte-Carlo estimate of the log-likelihood and, as every Monte-carlo estimate, its precision depends on the number of particles.
If you have too few particles then you will have a highly variable estimate of the log-likelihood and this will make the exploration of the likelihood surface quite imprecise. In addition, you might experience particle depletion (if you don’t know what that means just try to run a particle filter with a single particle).
If you have too many particles, then you will have an accurate estimate of your log-likelihood but it will be very time consuming so inefficient in practice.
In addition, the variability of the estimate might also depend on the region of the parameter space. For instance, in some region you might experience significant variability with 100 particles whereas in another region it might be fairly stable.
You can now return to the practical and try to think on a way to calibrate the number of particles.
Ideally we want enough particles to obtain a fairly stable estimate of the log-likelihood in a reasonable computational time.
A simple calibration approach consists in plotting the mean, standard deviation and computational time of the log-likelihood estimate as a function of the number of particles. Although several parameters can be tested, we will use a theta close to the mean posterior estimate of the deterministic fit since it is likely to be the region of the parameter space we want to explore with the pMCMC.
# load fitmodel, data and define init.state
data(SEIT4L_stoch)
data(FluTdC1971)
init.state <- c("S" = 279,"E" = 0,"I" = 2,"T1" = 3,"T2" = 0, "T3" = 0, "T4" = 0,"L" = 0,"Inc" = 0)
# pick a theta close to the mean posterior estimate of the deterministic fit
theta <- c("R0" = 7, "D_lat" = 1 , "D_inf" = 4, "alpha" = 0.5, "D_imm" = 10, "rho" = 0.65)
# vector of number of particles to test
test.n.particles <- seq(50, 1000, 50)
# number of replicates
n.replicates <- 100
# vector and data frame of results
sample.log.like <- vector("numeric", length = n.replicates)
res <- data.frame()
for(n.particles in test.n.particles){
# start measuring time
start.time <- Sys.time()
for(i in 1:n.replicates){
cat("Testing ", i, " particles\n")
# one Monte-Carlo estimate of the log-likelihood
sample.log.like[i] <- my_particleFilter(SEIT4L_stoch, theta,
init.state, FluTdC1971,
n.particles)
}
# end measuring time
end.time <- Sys.time()
# keep only replicate with finite log-likelihood to be able to compute the mean and sd
# this give us the proportion of replicates with particle depletion.
sample.finite.log.like <- sample.log.like[is.finite(sample.log.like)]
ans <- c(mean = mean(sample.finite.log.like),
sd = sd(sample.finite.log.like),
prop.depletion = 1-length(sample.finite.log.like)/length(sample.log.like),
time = end.time - start.time)
res <- rbind(res, t(ans))
}We ran this calibration algorithm and obtained the following results:
The mean and standard error of the log-likelihood estimate seem to stabilize around 400 particles (red line). By contrast, the computational time increases linearly with the number of particles. Using 400 particles looks optimal if we want a stable estimator while minimizing the computational time. Note however that it will take around 6 days to run 10000 times the particle filter with 400 particles. Since every step of the pMCMC requires to run a particle filter, that means that it will take 6 days to run 10000 iterations of the pMCMC. This is why you generally need a scientific computing cluster to run a pMCMC.
You can now return to the practical and set your pMCMC with 400 particles.
# the fitmodel
data(SEIT4L_stoch)
# wrapper for posterior
my_posteriorSto <- function(theta){
my_fitmodel <- SEIT4L_stoch
my_init.state <- c("S" = 279,"E" = 0,"I" = 2,"T1" = 3,"T2" = 0, "T3" = 0, "T4" = 0,"L" = 0,"Inc" = 0)
my_n.particles <- 400
# you can reduce the number of particles if your pMCMC is too slow
return(logPosterior(fitmodel = my_fitmodel,
theta = theta,
init.state = my_init.state,
data = FluTdC1971,
margLogLike = my_particleFilter,
n.particles = my_n.particles))
}
# load results of deterministic fit
data(mcmc_TdC_deter_longRun)
# Let's use the first trace only, no need to burn or thin
trace <- mcmc_SEITL_infoPrior_theta1$trace
# we will start the pMCMC at the mean posterior estimate
# of the deterministic fit
init.theta <- colMeans(trace[SEIT4L_stoch$theta.names])
# and we take the empirical covariance matrix for the
# Gaussian kernel proposal
covmat <- mcmc_SEITL_infoPrior_theta1$covmat.empirical
# lower and upper limits of each parameter
lower <- c(R0 = 0, D_lat = 0 , D_inf = 0, alpha = 0, D_imm = 0, rho = 0)
upper <- c(R0 = Inf, D_lat = Inf , D_inf = Inf, alpha = 1, D_imm = Inf, rho = 1)
# number of iterations for the MCMC
n.iterations <- 50 # just a few since it takes quite a lot of time
# Here we don't adapt so that we can check the acceptance rate of the empirical covariance matrix
adapt.size.start <- 100
adapt.size.cooling <- 0.99
adapt.shape.start <- 100You can now go back to the practical and try to run pMCMC with those settings.
Here is an example of analysis of our 5 chains of 3000 iterations with 50 particles.
# load traces
data(pmcmc_SEIT4L_infoPrior_n50)
# combine into a `mcmc.list` object
library("coda")
trace <- mcmc.list(lapply(pmcmc_SEIT4L_infoPrior_n50, function(chain) {
mcmc(chain$trace)
}))
# acceptance rate is below the optimal 23%
1 - rejectionRate(trace)
## R0 D_lat D_inf alpha D_imm
## 0.1272424 0.1272424 0.1272424 0.1272424 0.1272424
## rho log.prior log.likelihood log.posterior
## 0.1271757 0.1272424 0.1272424 0.1272424
# accordingly, the combined ESS is a bit low
effectiveSize(trace)
## R0 D_lat D_inf alpha D_imm
## 110.8391 254.4142 176.0739 135.6087 198.9908
## rho log.prior log.likelihood log.posterior
## 220.5789 281.3533 264.5747 260.7013
# Let's have a look at the traces
library("lattice")
xyplot(trace)The burn-in period looks relatively short, this is because we started the chains with a “good” init.theta. That said, since we have relatively short runs it might be useful to carefully choose the burn-in period with plotESSBurn.
# Actually, it looks like no burn-in is needed What about autocorrelation
acfplot(x = trace, lag.max = 50)
# There is substantial autocorrelation but we can't thin too much since the
# chains are quite short. So let's keep 1 iteration every 20
trace.thin.n50 <- burnAndThin(trace, thin = 20)
# Finally we can plot the posterior density
densityplot(x = trace.thin.n50)The 5 chains converged to somewhat different posterior distributions, with different shapes and modes. This is particularly evident for the posterior of \(R_0\), which looks bimodal. In addition, the ESS is too small to have smooth posterior distributions.
With so few particles, the likelihood estimate is very noisy and the exploration of the parameter space is not efficient. This is why we have a lot of autocorrelation and small ESSs. That said, the theoretical properties of the pMCMC guaranty that the chain will converge to the true posterior, even with 1 particle. Of course, this will take a lot of iterations so in practice it might be more efficient to spend more time computing the likelihood (i.e. having more particles) in order to reduce the number of iterations.
You can now return to the practical and analyse a pMCMC with much more particles that we ran for you.
Here is an example of analysis of our 5 chains of 3000 iterations with 400 particles.
# load traces
data(pmcmc_SEIT4L_infoPrior_n400)
# combine into a `mcmc.list` object
library("coda")
trace <- mcmc.list(lapply(pmcmc_SEIT4L_infoPrior_n400, function(chain) {
mcmc(chain$trace)
}))
# acceptance rate is near optimal
1 - rejectionRate(trace)
## R0 D_lat D_inf alpha D_imm
## 0.2485495 0.2485495 0.2485495 0.2485495 0.2485495
## rho log.prior log.likelihood log.posterior
## 0.2485495 0.2485495 0.2485495 0.2484828
# Note that the combined ESS is 2 times higher than with 50 particles
effectiveSize(trace)
## R0 D_lat D_inf alpha D_imm
## 620.3864 634.7383 619.7785 615.5586 621.1325
## rho log.prior log.likelihood log.posterior
## 598.9802 808.5497 597.5648 587.0809
# Let's have a look at the traces
library("lattice")
xyplot(trace)As in the analysis with 50 particles, the burn-in period is relatively short. However, with 400 particles the chains mix much better. Here again we decide to carefully choose the burn-in period with plotESSBurn.
# Autocorrelation decreases much more quickly than with 50 particles Let's
# keep 1 iteration every 20
trace.thin.n400 <- burnAndThin(trace, thin = 20)
# Let's plot the posterior densities
densityplot(x = trace.thin.n400)All 5 chains seems to have converged to the same posterior, which are smoother than with 50 particles. Let’s compare the combined posterior densities with that obtained with 50 particles
Although the posterior distributions are similar, those with 400 particles are smoother and more representative thanks to higher ESS. Note the different location of the mode of \(R_0\), which is shifted to the left with 50 particles. This is because 1 of the 5 chains with 50 particles shows a posterior with much lower \(R_0\) (see figure above), whereas the remaining 4 seems to have converged to the same distribution as the pMCMC with 400 particles.
Finally, note that the log-likelihood is overestimated with 50 particles, which can be problematic for model selection as we would overestimate the fit of the model.
Overall, this analysis confirms that the pMCMC works even with 50 particles but that it will require much more iterations to achieve the same posterior as the pMCMC with 400 particles. Although the latter takes more time at each iteration, it provides more better samples on short-runs. A good strategy is therefore to run many short chains in parallel with 400 particles. The chains start at different init.theta near the mode of the deterministic posterior, and are then combined to increase the overall ESS.
You can now return to the practical and proceed to the last section of this session.
Here we compare the combined traces of the deterministic SEIT4L model (2 chains of \(10^5\) iterations) with those obtained with the stochastic version (5 chains of \(3000\) iterations). Both analysis have assumed informative priors for \(D_\mathrm{lat}\) and \(D_\mathrm{inf}\) .
# load, burn and thin the deterministic fit
# create mcmc object
library("coda")
trace1 <- mcmc(mcmc_SEIT4L_infoPrior_theta1$trace)
trace2 <- mcmc(mcmc_SEIT4L_infoPrior_theta2$trace)
# combine in a mcmc.list
trace <- mcmc.list(trace1, trace2)
# burn and thin as the chain with uniform prior (see above sections)
trace.deter <- burnAndThin(trace, burn = 5000, thin = 40)
# compare posterior density
plotPosteriorDensity(list(deter = trace.deter, sto = trace.thin.n400))Overall, the posterior distributions are quite different. This is especially true for \(R_0\) and \(D_{imm}\). In addition, the discrepancy in the posterior distribution of the log-likelihood seems to indicate that the stochastic model fits much better. We can quantify this by computing the DIC of the stochastic SEIT4L model.
# combine all traces in a data frame
library("plyr")
trace.combined <- ldply(trace.thin.n400)
# take the mean of theta
theta.bar <- colMeans(trace.combined[SEIT4L_stoch$theta.names])
print(theta.bar)
## R0 D_lat D_inf alpha D_imm rho
## 8.8904999 1.6551303 2.5233450 0.4776501 12.2714425 0.6649321
# compute its log-likelihood
init.state <- c(S = 279, E = 0, I = 2, T1 = 3, T2 = 0, T3 = 0, T4 = 0, L = 0,
Inc = 0)
log.like.theta.bar <- my_particleFilter(SEIT4L_stoch, theta.bar, init.state,
data = FluTdC1971, n.particles = 400)
print(log.like.theta.bar)
## [1] -115.3378
log.like.theta.bar.deter <- dTrajObs(SEIT4L_deter, theta.bar, init.state, data = FluTdC1971,
log = TRUE)
print(log.like.theta.bar.deter)
## [1] -163.6574
# and its deviance
D.theta.bar <- -2 * log.like.theta.bar
print(D.theta.bar)
## [1] 230.6756
# the effective number of parameters
p.D <- var(-2 * trace.combined$log.likelihood)/2
print(p.D)
## [1] 5.293693
# and finally the DIC
DIC <- D.theta.bar + 2 * p.D
print(DIC)
## [1] 241.2629In the previous session, we found that the DIC of the deterministic SEIT4L model was equal to 265. The difference of 20 indicates that the stochastic model should strongly be preferred to the deterministic model.
We can visually check this result by plotting the posterior fit of each model:
# take the mean posterior estimates of the deterministic model
x <- summary(trace.deter)
theta.bar.deter <- x$statistics[SEIT4L_deter$theta.names, "Mean"]
plotFit(SEIT4L_stoch, theta.bar, init.state, data = FluTdC1971, n.replicates = 1000)Despite the fact that the deterministic model seems to better capture the first peak of the epidemic, the stochastic model better explains the variability of the observed time-series. In particular, the 95% CI of the stochastic model captures almost all the observed data points, even during the first peak.
You’ve already finished? Why not go further?